
    Cross-Domain Multitask Model for Head Detection and Facial Attribute Estimation

    Extracting specific attributes of a face within an image, such as emotion, age, or head pose, has numerous applications. As one of the most widely used vision-based attribute extraction models, Head Pose Estimation (HPE) models have been extensively explored. Despite the success of these models, the pre-processing step of cropping the region of interest from the image before it is fed into the network remains a challenge. Moreover, a significant portion of the existing models are problem-specific, developed exclusively for HPE. In response to the wide application of HPE models and the limitations of existing techniques, we developed a multi-purpose, multi-task model that performs face detection and pose estimation (along both the yaw and pitch axes) in parallel. The model is based on the Mask R-CNN object detection framework: it computes a collection of shared mid-level features that feed several independent neural networks for face detection and pose estimation. We evaluated the proposed model on two publicly available datasets, Prima and BIWI, obtaining Mean Absolute Errors (MAEs) of 8.0 ± 8.6 and 8.2 ± 8.1 for yaw and pitch on Prima, and 6.2 ± 4.7 and 6.6 ± 4.9 on BIWI. The generalization capability of the model and its cross-domain effectiveness were assessed on the publicly available UTKFace dataset for face detection and age estimation, resulting in an MAE of 5.3 ± 3.2. A comparison of the proposed model's performance across the domains it was tested on shows that it compares favorably with state-of-the-art models, as demonstrated by their published results. We provide the source code of our model for public use at: https://github.com/kahroba2000/MTL_MRCNN
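
    As a rough sketch of the shared-feature, multi-head design described in the abstract, the snippet below wires a shared backbone to independent heads for face classification and yaw/pitch regression. The backbone choice (ResNet-50), layer sizes, and head structure are illustrative assumptions, not the published Mask R-CNN-based architecture.

```python
import torch
import torch.nn as nn
import torchvision

class SharedBackboneMultiTask(nn.Module):
    """Minimal multi-task sketch: shared mid-level features feed
    independent task heads. Sizes/backbone are assumptions, not the paper's."""

    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        resnet = torchvision.models.resnet50(weights=None)
        # Shared feature extractor: ResNet-50 without its final classifier.
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])
        self.face_head = nn.Linear(feat_dim, 2)  # face / no-face logits
        self.pose_head = nn.Linear(feat_dim, 2)  # yaw and pitch, in degrees

    def forward(self, images: torch.Tensor):
        shared = self.backbone(images).flatten(1)  # (N, feat_dim)
        return self.face_head(shared), self.pose_head(shared)

model = SharedBackboneMultiTask()
face_logits, yaw_pitch = model(torch.randn(2, 3, 224, 224))
print(face_logits.shape, yaw_pitch.shape)  # (2, 2) and (2, 2)
```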

    BioGAN: An unpaired GAN-based image to image translation model for microbiological images

    Background and objective: A diversified dataset is crucial for training a well-generalized supervised computer vision algorithm. However, in the field of microbiology, the generation and annotation of a diverse dataset that includes field-taken images is time-consuming, costly, and in some cases impossible. Image-to-image translation frameworks allow us to diversify a dataset by transferring images from one domain to another. However, most existing image translation techniques require a paired dataset (the original image and its corresponding image in the target domain), which is a significant challenge to collect. In addition, the application of these image translation frameworks in microbiology is rarely discussed. In this study, we aim to develop an unpaired GAN-based (Generative Adversarial Network) image-to-image translation model for microbiological images, and to study how it can improve the generalization ability of object detection models. Methods: We present an unpaired and unsupervised image translation model that translates laboratory-taken microbiological images to field images, building upon recent advances in GAN networks and the Perceptual loss function. We propose a novel GAN model, BioGAN, which combines Adversarial and Perceptual loss to transform the high-level features of laboratory-taken images of Prototheca bovis into field images while keeping their spatial features. Results: We studied the contribution of the Adversarial and Perceptual losses to the generation of realistic field images. We used the synthetic field images generated by BioGAN to train an object detection framework and compared the results with those of an object detection framework trained on laboratory images; this resulted in improvements of up to 68.1% in F1-score and 75.3% in mAP. We also present the results of a qualitative evaluation, performed by experts, of the similarity of BioGAN's synthetic images to field images.
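
    To make the loss composition concrete, here is a minimal sketch of a generator loss combining an adversarial term with a VGG-based Perceptual term, where the content term compares the synthetic image with the source lab image (to keep spatial features) and the style term compares Gram matrices against an unpaired field image (to transfer texture). The VGG layer cut, loss weights, and Gram-matrix style formulation follow the common perceptual-loss recipe and are assumptions, not BioGAN's exact configuration.

```python
import torch
import torch.nn.functional as F
import torchvision

# Frozen VGG16 slice used as a fixed feature extractor (layer cut is an assumption).
vgg = torchvision.models.vgg16(weights=None).features[:16].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def gram(feat: torch.Tensor) -> torch.Tensor:
    # Gram matrix of (N, C, H, W) features; captures texture/style statistics.
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def generator_loss(disc_fake, fake, lab_src, field_ref,
                   w_adv=1.0, w_content=1.0, w_style=10.0):
    """Adversarial + Perceptual loss; the weights are illustrative assumptions."""
    # Adversarial term: the generator tries to make the discriminator say "real".
    adv = F.binary_cross_entropy_with_logits(disc_fake, torch.ones_like(disc_fake))
    # Content reconstruction: preserve the spatial features of the lab image.
    content = F.mse_loss(vgg(fake), vgg(lab_src))
    # Style reconstruction: match the texture statistics of an unpaired field image.
    style = F.mse_loss(gram(vgg(fake)), gram(vgg(field_ref)))
    return w_adv * adv + w_content * content + w_style * style
```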

    A crowdsourcing semi-automatic image segmentation platform for cell biology

    State-of-the-art computer vision algorithms rely on large and accurately annotated datasets, which are expensive, laborious, and time-consuming to generate. This task is even more challenging for microbiological images, because they require specialized expertise for accurate annotation. Previous studies show that crowdsourcing and assistive annotation tools are two potential solutions to address this challenge. In this work, we have developed a web-based platform to enable crowdsourced annotation of image data; the platform is powered by a semi-automated assistive tool that supports non-expert annotators and improves annotation efficiency. The behavior of annotators with and without the assistive tool is analyzed using biological images of different complexity. More specifically, non-experts were asked to use the platform to annotate microbiological images of gut parasites, and their annotations were compared with those of experts. A quantitative evaluation of the results confirms that the assistive tool can noticeably decrease the non-expert annotation cost (time, clicks, interactions, etc.) while preserving or even improving annotation quality. The annotation quality of non-experts was investigated using IoU (intersection over union), precision, and recall; based on this analysis we propose some ideas on how to better design similar crowdsourcing and assistive platforms.
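
    For reference, the annotation-quality metrics named above can be computed from a pair of boolean annotation masks as in the short NumPy sketch below (representing annotations as pixel masks is an assumption made for illustration):

```python
import numpy as np

def annotation_scores(pred: np.ndarray, truth: np.ndarray):
    """IoU, precision, and recall for two boolean annotation masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    iou = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return iou, precision, recall

# Example: compare a non-expert annotation against an expert reference.
pred = np.zeros((64, 64), bool); pred[10:40, 10:40] = True
truth = np.zeros((64, 64), bool); truth[15:45, 15:45] = True
print(annotation_scores(pred, truth))
```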

    An Investigation into Generating High-quality, Diversified Datasets of Microbiological Images for Supervised Computer Vision Models

    Supervised deep neural networks require training datasets in which the data are annotated before use. For a reliable deep neural network, the datasets should meet several criteria, including high-quality annotation, diversity, and abundance of data. Generating such datasets is costly and time-consuming, especially for image datasets, owing to the inaccessibility of large-scale and diverse images as well as the laborious process of image annotation. These problems are exacerbated in the medical domain, since medical image collection is more expensive and annotation requires in-depth domain knowledge. Thus, big data and high-quality annotation are two of the most difficult challenges in the annotation of medical images, not to mention ethical considerations. The computer vision community has put considerable effort into tackling these challenges, e.g., by synthetically generating low-cost (economically, time-wise, etc.) images or by using computational techniques to facilitate the annotation process. Despite these efforts, many aspects of the domain remain understudied. For example, in crowdsourcing, a common way of generating rapid and cost-effective annotations, there is a risk of low-skilled annotators degrading annotation quality. Moreover, the tedious nature of some annotation tasks can detrimentally affect annotators' quality over prolonged annotation sessions (even for skilled workers). In this Ph.D. thesis, some of these challenges were explored comprehensively and solutions were proposed, focusing on the three studies outlined below.

    First, as a prerequisite of this thesis, a web-based annotation platform for image datasets was developed, powered by a crowdsourcing tool that was used in the subsequent studies. This platform is now available online at www.aiconsole.com. Furthermore, a dataset of microbiological images of three different parasite groups was collected and annotated by biologist research partners. In the first study, we compared the performance of an AI-based assistive tool for helping annotators (also known as crowd workers in the crowdsourcing context) with microbiological image annotation against manual annotation. To accomplish this, the web-based annotation platform was integrated with a novel assistive tool (based on a weakly trained object detection model), and a two-day experiment with crowd workers was conducted in two modes: i) AI-based assistive annotation and ii) manual annotation. A set of quantitative evaluations assessed the annotators' behaviour and the assistive tool's performance. Overall, the results showed that this assistive tool can decrease annotation cost (measured by time and number of clicks). Drawing on these findings, we provide recommendations on how future platforms with a similar assistive tool can be designed to better engage annotators with the task and improve performance. Because the results concerning annotators' behaviour and the effect of fatigue on their performance were not conclusive, the platform was upgraded with additional tools to address further research questions in the next study.

    The second study aimed to answer three research questions: i) how does crowd workers' performance change over time in a prolonged task; ii) is it feasible to assess annotators' fatigue and performance via annotation-based and mouse-based features; and iii) can a new aggregation technique combine crowd workers' annotations according to their annotations' estimated quality? We found an increase and then a decrease in annotators' performance (as measured by the Dice Similarity Coefficient, DSC) as a function of learning and fatigue effects: workers in the learning region gained experience, resulting in better performance, while in the fatigue region their performance deteriorated. A set of extracted annotation-related and mouse-related features demonstrated a strong correlation with workers' quality and fatigue level, which motivated regression models for estimating workers' performance. Additionally, we proposed a new Weighted Majority Voting (WMV) method for aggregating annotations that takes into account the estimated quality of each individual annotation; a sketch of this idea is given after this abstract. Compared with benchmark aggregation techniques (conventional majority voting and STAPLE), the new aggregation method improved the mean and variance of the DSCs.

    The third study tackled the lack of diversity in microbiology image datasets by developing a GAN-based image-to-image translation model (BioGAN) that converts microbiology images taken in the lab into images with the visual characteristics of images taken in the field. This study was motivated by the fact that collecting microbiological images in the field is not as simple and affordable as lab-based image collection. By adding a Perceptual loss (comprising a Content reconstruction loss and a Style reconstruction loss) to the Adversarial loss of a classical GAN, the difference between the high-level (texture) features of a synthetic image and a real-world field image is penalised. The proposed BioGAN model was then tested on its ability to translate laboratory-taken images of Prototheca into field-like images, using qualitative evaluation by experts and quantitative evaluation with the Mask R-CNN object detection framework. We found that the generated images boosted both the diversity and the volume of the dataset. In the synthetically generated images, the spatial characteristics remain the same (i.e., the cells remain in the same position with the same dimensions), which means that the annotations of the lab-taken images remain valid and usable for the synthetic field images, reducing the cost of annotation.

    These findings and the developed models extend theoretical and practical knowledge in the area of medical image annotation, enabling the creation of low-cost but high-quality image datasets for supervised computer vision models based on neural networks. Specifically, the contributions lie in: i) providing AI-based tools for computer vision practitioners and researchers to generate cost-effective yet high-quality annotations for image datasets; ii) developing a set of guidelines to help developers design better crowdsourcing platforms; iii) understanding users' behaviour and interactions in crowdsourcing environments; iv) aggregating annotations from crowd workers more effectively; and v) demonstrating the potential of a GAN model for enhancing the diversity of image datasets. In addition, as one of the major practical contributions of this Ph.D., the crowdsourcing image annotation platform and the code for the image translation model have been published for use by practitioners.
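
    As a concrete illustration of the quality-weighted aggregation described in the second study, the sketch below fuses per-worker boolean masks by weighting each vote with the worker's estimated quality, and scores the result with the DSC. The weight normalisation and the 0.5 decision threshold are illustrative assumptions, not necessarily the thesis's exact WMV rule.

```python
import numpy as np

def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice Similarity Coefficient between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2 * inter / total if total else 1.0

def weighted_majority_vote(masks, weights):
    """Fuse per-worker boolean masks, weighting each vote by the worker's
    estimated annotation quality (threshold of 0.5 is an assumption)."""
    stacked = np.stack([m.astype(float) for m in masks])  # (workers, H, W)
    w = np.asarray(weights, float)
    w = w / w.sum()                                       # normalise weights
    consensus = np.tensordot(w, stacked, axes=1)          # weighted vote map
    return consensus >= 0.5

# Example: three workers with estimated qualities 0.9, 0.6, and 0.3.
rng = np.random.default_rng(0)
workers = [rng.random((32, 32)) > 0.5 for _ in range(3)]
fused = weighted_majority_vote(workers, [0.9, 0.6, 0.3])
print(dice(fused, workers[0]))
```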

    An EMG-based Eating Behaviour Monitoring System with Haptic Feedback to Promote Mindful Eating

    Mindless eating, or a lack of awareness of the food we are consuming, has been linked to health problems attributed to unhealthy eating behaviour, including obesity. Traditional approaches to moderating eating behaviour often rely on inaccurate self-logging, manual observation, or bulky equipment. Overall, there is a clear unmet clinical need for an intelligent, lightweight system that can automatically monitor eating behaviour and provide feedback. In this paper, we investigate: i) the development of an automated system for detecting eating behaviour using wearable Electromyography (EMG) sensors, and ii) the application of the proposed system, combined with real-time wristband haptic feedback, to facilitate mindful eating. Data collected from 16 participants were used to develop an algorithm for detecting chewing and swallowing. We extracted 18 features from the EMG signals and presented them to different classifiers to develop a system that enables participants to self-moderate their chewing behaviour using haptic feedback. An additional experimental study with 20 further participants evaluated the effectiveness of eating monitoring and the haptic interface in promoting mindful eating. We assessed model performance with a standard leave-one-participant-out validation scheme using standard metrics (F1-score). The proposed algorithm accurately and automatically assessed eating behaviour using the EMG-extracted features and a Support Vector Machine (SVM): F1-score = 0.95 for chewing classification and F1-score = 0.87 for swallowing classification. The experimental study showed that participants exhibited a lower rate of chewing when haptic feedback was delivered as wristband vibration, compared with the baseline and non-haptic conditions (F(2,38) = 58.243, p < .001). These findings may have major implications for research on eating behaviour, providing key insights into the impact of automatic chewing detection and haptic feedback systems on moderating eating behaviour towards improving health outcomes.
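
    A minimal sketch of the evaluation pipeline described above, using scikit-learn's LeaveOneGroupOut to hold out one participant per fold and an RBF-kernel SVM scored by F1. The synthetic data shapes, feature values, and pipeline details are placeholders, not the study's actual data or configuration.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import f1_score

# Illustrative stand-ins: 18 EMG features per window, binary chew/no-chew
# labels, and a participant id per sample (all shapes are assumptions).
rng = np.random.default_rng(0)
X = rng.standard_normal((1600, 18))
y = rng.integers(0, 2, 1600)
participants = np.repeat(np.arange(16), 100)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = []
# Each fold trains on 15 participants and tests on the held-out one.
for train, test in LeaveOneGroupOut().split(X, y, groups=participants):
    clf.fit(X[train], y[train])
    scores.append(f1_score(y[test], clf.predict(X[test])))
print(f"mean F1 across held-out participants: {np.mean(scores):.2f}")
```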